Variable Selection in Logistic Regression: The British English Dative Alternation

Author

  • Daphne Theijssen
Abstract

In this paper, we address the problem of selecting the ‘optimal’ subset of variables in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, in which speakers and writers can choose between two (equally grammatical) syntactic constructions to express the same meaning. Using 29 explanatory variables taken from the literature, we build two types of models: (1) models with verb sense included as a random effect (verb senses often have a bias towards one of the two variants), and (2) models without a random effect. For each type, we build three models: by including all variables and keeping only the significant ones, by sequentially adding the most predictive variable (forward regression), and by sequentially removing the least predictive variable (backward regression). Since these six approaches yield five different models, we advise researchers to be cautious about basing their conclusions solely on the single ‘optimal’ model they find.
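
The forward and backward procedures described above can be illustrated with a minimal pure-Python sketch for a logistic regression without a random effect. This is not the author's implementation: the abstract does not say how "most predictive" is measured, so the sketch assumes AIC as the criterion, fits each candidate model by Newton-Raphson, and uses hypothetical synthetic data with four made-up variables x0 to x3. The first approach (keeping only significant variables) and the verb-sense random effect are not covered.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log_likelihood(X, y, cols, iters=25):
    """Fit a logistic regression on columns `cols` (plus an intercept)
    by Newton-Raphson and return the maximised log-likelihood."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]  # design matrix
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # X^T W X (Fisher information)
        for r, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = mu * (1.0 - mu)
            for a in range(p):
                grad[a] += (yi - mu) * r[a]
                for c in range(p):
                    info[a][c] += w * r[a] * r[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ll = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        # y * eta - log(1 + e^eta), with the log term computed stably
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def aic(X, y, cols):
    # Assumption: AIC = 2k - 2 log L is the selection criterion (not stated in the abstract).
    return 2 * (len(cols) + 1) - 2 * log_likelihood(X, y, cols)


def forward_selection(X, y, candidates):
    """Greedily add the variable that lowers AIC most; stop when none does."""
    chosen, best = [], aic(X, y, [])
    while True:
        remaining = [v for v in candidates if v not in chosen]
        if not remaining:
            break
        score, var = min((aic(X, y, chosen + [v]), v) for v in remaining)
        if score >= best:
            break
        chosen, best = chosen + [var], score
    return sorted(chosen)


def backward_elimination(X, y, candidates):
    """Start from all variables; greedily drop the one whose removal lowers AIC most."""
    chosen = list(candidates)
    best = aic(X, y, chosen)
    while chosen:
        score, var = min((aic(X, y, [c for c in chosen if c != v]), v) for v in chosen)
        if score >= best:
            break
        chosen, best = [c for c in chosen if c != var], score
    return sorted(chosen)


# Hypothetical synthetic data: x0 and x1 drive the outcome, x2 and x3 are noise.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(300)]
y = [1 if random.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]

print("forward: ", forward_selection(X, y, [0, 1, 2, 3]))
print("backward:", backward_elimination(X, y, [0, 1, 2, 3]))
```

Each procedure follows its own greedy path, so on real data with correlated predictors the two can stop at different subsets, which is the instability the abstract warns about; on this toy data both are expected to retain x0 and x1.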

Similar resources

Evaluating automatic annotation: automatically detecting and enriching instances of the dative alternation

In this article, we automatically create two large and richly annotated data sets for studying the English dative alternation. With an intrinsic and an extrinsic evaluation, we address the question of whether such data sets that are obtained and enriched automatically are suitable for linguistic research, even if they contain errors. The extrinsic evaluation consists of building logistic regres...

Choosing alternatives: Using Bayesian Networks and memory-based learning to study the dative alternation

In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and Memory-based learning. For the Bayesian ...

Predicting is not explaining: targeted learning of the dative alternation

Corpus linguists dig into large-scale collections of texts to better understand the rules governing a given language. We advocate for ambitious corpus linguistics drawing inspiration from the latest developments of semiparametrics for a modern targeted learning. Transgressing discipline-specific borders, we adapt an approach that has proven successful in biostatistics and apply it to the well-t...

The Dative Alternation in African American English: Researching Syntactic Variation and Change in a

Recent research has shown the dative alternation in English to be a productive arena for examining the relationship between group-level variation and the internalization of individuals’ grammars. Experimental methods (e.g., Bresnan and Ford 2010) and the analysis of large published corpora (e.g., Bresnan et al. 2007) have revealed subtle cross-dialect differences for this variable. The current ...

The dative alternation in African American English: Researching syntactic variation and change across sociolinguistic datasets

Recent research has shown the dative alternation in English to be a productive arena for examining the relationship between group-level variation and the internalization of individuals’ grammars. Experimental methods (e.g., Bresnan and Ford 2010) and the analysis of large published corpora (e.g., Bresnan et al. 2007) have revealed subtle cross-dialect differences for this variable. The current ...

Publication date: 2009